gSearch: a fast and flexible general search tool for whole-genome sequencing

نویسندگان

  • Taemin Song
  • Kyu Baek Hwang
  • Michael Hsing
  • Kyungjoon Lee
  • Justin Bohn
  • Sek Won Kong
چکیده

BACKGROUND Various processes such as annotation and filtering of variants or comparison of variants in different genomes are required in whole-genome or exome analysis pipelines. However, processing different databases and searching among millions of genomic loci is not trivial. RESULTS gSearch compares sequence variants in the Genome Variation Format (GVF) or Variant Call Format (VCF) with a pre-compiled annotation or with variants in other genomes. Its search algorithms are subsequently optimized and implemented in a multi-threaded manner. The proposed method is not a stand-alone annotation tool with its own reference databases. Rather, it is a search utility that readily accepts public or user-prepared reference files in various formats including GVF, Generic Feature Format version 3 (GFF3), Gene Transfer Format (GTF), VCF and Browser Extensible Data (BED) format. Compared to existing tools such as ANNOVAR, gSearch runs more than 10 times faster. For example, it is capable of annotating 52.8 million variants with allele frequencies in 6 min. AVAILABILITY gSearch is available at http://ml.ssu.ac.kr/gSearch. It can be used as an independent search tool or can easily be integrated to existing pipelines through various programming environments such as Perl, Ruby and Python.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

I-20: Towards The Transparent Embryo: Dynamics and Ethics of Comprehensive Preimplantation Genetic Screening

Background: To study the ethical aspects of comprehensive preimplantation genetic screening (PGS) through microarrays and whole genome sequencing Materials and Methods: In order to pinpoint ethical issues regarding comprehensive embryo screening we have first investigated the technical and moral issues by organizing a campus meeting with experts and by a literature study. Subsequently we have i...

متن کامل

Finding Syntactic Structure in Unparsed Corpora The Gsearch Corpus Query System

The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gs...

متن کامل

SciRoKo: a new tool for whole genome microsatellite search and investigation

UNLABELLED SciRoKo is a user-friendly software tool for the identification of microsatellites in genomic sequences. The combination of an extremely fast search algorithm with a built-in summary statistic tool makes SciRoKo an excellent tool for full genome analysis. Compared to other already existing tools, SciRoKo also allows the analysis of compound microsatellites. AVAILABILITY free for us...

متن کامل

Development of a Tool for Copy Number Analysis of Cancer Genomes using High Throughput Sequencing Data

Genomic copy number alterations (CNA) and loss of heterozygozity (LOH) are two types of genomic instabilities associated with cancer. Acquisition of these genomic instabilities affects the expression level of oncogenes and tumor suppressor genes. Thus, accurate detection of these abnormalities is a crucial step in identifying novel oncogenes and tumor suppressor genes. Whole-genome sequencing o...

متن کامل

Whole-Genome Sequencing of a Clinically Isolated Antibiotic-Resistant Enterococcus faecium EntfacYE

Background and Objective: Enterococcal infections are considered the most common nosocomial infections. Nowadays, enterococci show high resistance to common antibiotics, especially vancomycin. Vancomycin-resistant Enterococcus faecium is one of the most common nosocomial infections, which is included in the World Health Organization priority pathogens list for research and development of new an...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 28 16  شماره 

صفحات  -

تاریخ انتشار 2012